-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous output #105202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… an ambiguous return value
Doc/c-api/iter.rst
Outdated
while ((item = PyIter_Next(iterator))) { | ||
PyObject *item; | ||
int res; | ||
while ((res = PyIter_NextItem(iterator, &item)) == 0 && item != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better if PyIter_NextItem
returned -1 for error, 0 for exhausted iterator and 1 if there is a 'next' value. Then we could just continue as long as it return 1 (without checking for NULL).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal preference would be to stick with the C idiom of returning -1 or 0. I also think that the API design should not be discussed in a conversation on a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After trying to migrate a few places, I agree. Working with 3 return values is not simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I do think it should be PyObject* PyIter_NextItem(PyObject *o, int *err)
, otherwise we need two checks in every loop of iteration and that's just wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need two checks, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not on every iteration. Only when item it NULL and you exit the loop you need to see what err is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the return value is still ambiguous. I thought the aim of the new API was an unambiguous return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aim is that you can find out whether an error occurred without calling PyErr_Occurred().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting from the issue:
We will try to move away from those APIs to alternative ones whose return values non-ambiguously indicate whether there has been an error.
Personally, I don't find the new API to be an improvement. While the overall caller logic would have the same structure, it takes an additional line to create the variable, adds pointer logic in place of the cleaner looking function call, and it creates an opportunity to lose track of where the error came from. I really don't think we need this. Also, it is somewhat aggressive to label the long-standing API as "legacy" as if it is wrong or hard to read in some way. Likewise it is aggressive to label the new API as "preferred". That will only induce some new core dev to sweep through and replace every existing call even though the current code works just fine and the existing API will never go away (at least not without creating a ton of unnecessary work for the Python ecosystem). |
The opposite: currently you need to check PyErr_Occurred() so you can’t be sure where the error came from. With this change you know. |
I'm still -1, as explained in the issue. Also, the PR title is misleading; you're replacing one API with an ambiguous return value with another API with an ambiguous return value. It ends up adding yet another problematic API to the list of ambiguous APIs. |
Where is the ambiguity? The return value is the pair (PyObject*, int). Everything you need it provided by the function, so you don’t need to access global state. |
I reworded the title to ‘ambiguous output’ rather than ‘ambiguous return value’. |
The existing API has been used successfully for two decades and during that time I've not seen a single user request for an alternative API. There isn't a real user issue being solved here. ISTM that this is an invented problem and that the solution is in some ways worse than what we have now. In addition, having more than one way to do it will create more problems than it solves. Since this is part of the stable ABI, the new and old functions will live in perpetuity and create be a recurring source of confusion. Also, the new API will not be usable by any package wanting to be compatible will versions of Python before 3.12. Having a mix of techniques is an undesirable outcome. |
The issue of c api functions with ambiguous return values came up in the discussions that started at the recent language summit, about the problems with the c api (see the blog post about that). The problem is not only related to PyIter_Next usage in cpython, but to how the c api works for alternative python implementations (that don’t use cpython’s internal error handling mechanism.) There was some discussion about whether we can fix problems like this incrementally, or we just need to give up on the current api and redesign a new one from scratch. I believe it’s always better to fix things incrementally when you can, and redesign from scratch only when you have to (the redesign can then focus on the problems that actually require it). Other core devs said it is not possible to fix the current c api, and I see now (almost at the first hurdle) what they mean. This probably means that we will have a very inflated list of problems “requiring” a new c api. Not because that is the right way to go about this, but because we won’t be able to get through discussions like this to make incremental fixes. It probably won’t make much of a difference to the new c api (there will likely be a new one either way because not all problems can be fixed incrementally in the current one). But I do think it’s sad if the current c api (which will be with us for a long time) can’t evolve because most core devs have given up on that. And I certainly understand now why they have. |
Reopening following discussion on the issue. |
I strongly prefer |
@rhettinger @erlend-aasland The existence of this state means we need to check that the VM is in a valid state at C API boundaries. Currently there are four possible states after calling an API function that may fail (which is all but a few):
The first two of those are correct, but even then the second case requires the caller to handle the exception, or it leaves the VM in an invalid state. We have to constantly check that the exception is not set, because if it were and we call If the return values all C API functions (and C extension code) were unambiguous, then error handling would become (mostly) stateless, making the VM more robust and calling into C extensions potentially faster, as none of the resulting states can leave the VM in a invalid state.
The goal is that Here's an example. Suppose we are iterating over two iterators at the same time (like in
But we forget to check the error case of
but we do check
|
There is an efficiency issue here, as well. If we change the underlying protocol to match that of int
PyIter_NextItem(PyObject *iter, PyObject **next)
{
return (*Py_TYPE(iter)->tp_iternextitem)(iter, next);
} Note that if |
PyIter_Next
returns NULL for both normal exit and error. The caller needs to callPyErr_Occurred()
to disambiguate. This PR addsPyIter_NextItem
, which is not ambiguous about errors.